In this script I’ll be looking at how to analyse the outputs from the MCMCglmm models to get the something about the elaboration/exploration stories.

I’m going to look at it using two approaches: * the blob-wise one where I’ll look at the differences between whole blobs (ellipses) in terms of elaboration/exploration (Robinson & Beckerman style); * and the tip-wise one where I’ll look at the elaboration/exploration score of the tips on the tree (cf. their blobs) relative to the whole phylogeny and to their respective blobs (clades).

We can do all that with the internal package "beer" that can be install via github (if you have access to the repo):

devtools::install_github("TGuillerme/elaboration_exploration_bird_beaks/beer")
library(beer)

Data

I’m going to focus on the data from Gavin and especially the models 5 and 6 that are:

## Loading the correct models
data("model_list")
model_phylo1_clade3 <- model_list[[4]]
model_phylo4 <- model_list[[7]]

And here’s what the trait space looks like:

## Loading the data
data("morphdat")
## Plotting it
colour_vector <- c("orange", "blue", "darkgreen")
plot.space(morphdat,
           col = colour_vector,
           levels = morphdat$clade,
           xlab = "PC1 (90.5%)",
           ylab = "PC1 (6.86%)")
legend("topleft", legend = levels(morphdat$clade), col = colour_vector, pch = 19)

Note that the PC% are from the whole PC in Gavin’s example.

Blob-wise approach

Here we’re basically comparing multidimensional ellipses (here 3D ones but I think we should push it to more dimensions!).

Phylo + clade model

First we can visualise some of these ellipses (100, randomly drawn from the posterior):

## Selecting 100 random covariance matrices from the MCMCglmm
covar_matrices <- get.covar(model_phylo1_clade3, n = 100)

## Plotting the results
plot.space(morphdat,
           col = colour_vector,
           levels = morphdat$clade,
           xlab = "PC1 (90.5%)",
           ylab = "PC1 (6.86%)")
plot.ellipses(covar_matrices, col = c("grey",colour_vector),
              add = TRUE)

Because this is really messy, we can centre these ellipses on the groups ellipses average centres:

plot.space(morphdat,
           col = colour_vector,
           levels = morphdat$clade,
           xlab = "PC1 (90.5%)",
           ylab = "PC1 (6.86%)")
plot.ellipses(covar_matrices, centre = mean,
              col = c("grey",colour_vector),
              add = TRUE)

We can then measure different stuff from these ellipses (e.g. going full statistics from Robinson & Beckerman).

However, here I’m just going to focus on the elaboration/exploration of these ellipses. For the three clades, I’m going to measure:

  • the angle of the clade’s ellipse to the general one (which will be their degree of exploration);
  • their scaled projection (where the ellipses all start from the point 0,0): showing how much the clade elaborates on the major phylo axis;
  • and their scaled rejection (how much they explore).

For these three metrics I expect:

  • the gulls and plovers to elaborate quiet a lot (angle ~90) compared to the sandpipers (angle ~90)
  • the gulls to explore the more followed by the plovers and then the sandpipers
  • the sandpipers to ellaborate the more followed by the gulls and then the plovers

Note that the three metrics are measured independently of the ellipse position in space and are measured in 3D.

## Get all the phylo main axes
level_centred_major_axes <- get.axes(covar_matrices, centre = mean)
uncentered_major_axes <- get.axes(covar_matrices, centre = "intercept")

And this is what it looks like for the uncentred ones:

plot(morphdat[, c(1,2)], pch = 19, cex = 0.5,
     col = colour_vector[morphdat$clade],
     xlab = "PC1 (90.5%)",
     ylab = "PC1 (6.86%)")
plot.ellipses(covar_matrices, centre = "intercept",
              col = c("grey",colour_vector),
              add = TRUE)
plot.axes(uncentered_major_axes,
         col = c("grey",colour_vector),
         add = TRUE)

And for the centred ones:

plot.space(morphdat,
           col = colour_vector,
           levels = morphdat$clade,
           xlab = "PC1 (90.5%)",
           ylab = "PC1 (6.86%)")
plot.ellipses(covar_matrices, centre = mean,
              col = c("grey",colour_vector),
              add = TRUE)
plot.axes(level_centred_major_axes,
         col = c("grey",colour_vector),
         add = TRUE)

Note that the major axes look slightly off for some of the ellipses. I think this is due to the difference in aspect ratio in both plots that exagerates vertical positions and deforms the ellipses:

plot.space(morphdat,
           col = colour_vector,
           levels = morphdat$clade,
           xlab = "PC1 (90.5%)",
           ylab = "PC1 (6.86%)",
           main = "Major axes of the ellipses\n(aspect ratio = 1)",
           xlim = c(-1.5, 1.5),
           ylim = c(-1.5, 1.5))
plot.ellipses(covar_matrices, centre = mean,
              col = c("grey",colour_vector),
              add = TRUE)
plot.axes(level_centred_major_axes,
         col = c("grey",colour_vector),
         add = TRUE)

This is an example with just the major axes (projected in 2D):

plot.axes(uncentered_major_axes,
         col = c("grey",colour_vector),
         add = FALSE, main = "Major axes (un-centred)")

We can then compare these vectors per blobs. We in order to facilitate the vectors comparisons, we will translate the groups vectors (clade ones) on the origin of the phylo vector.

## Get all the angles and stuff
group_results <- analyses.group(uncentered_major_axes, base = "animal")
names(group_results) <- levels(morphdat$clade)
## Plot the blob wise metrics
plot.analyses.group(group_results, col = colour_vector)

Methods bits draft

The model

We run a MCMCglmm with different levels.

The mini chains

We used a mini chain approach to:

  • accounted for phylogenetic uncertainty [healy.et.al]
  • accounted for global parameter estimates uncertainty (local vs. global peak)
  • greatly speeding the calculations through high parallelisation

We then combine the mini chains

Exploration and elaboration analyses

For each model, we randomly draw \(N\) variance covariance matrices estimates. We then used them to analyse the exploration/elaboration component at the species level and at the clade level indepently.

We extracted the major axes of each selected covariance matrices using blablbalba

At the clade level

We then translated each set of major axes so that they align with each other (i.e. translated the clade major axes origin onto the global major axes origin).

We then calculated the angle between the clade major axes and the global major axes as well as their projection and rejections.

At the species level

Here we only projected the species coordinates on either:

  • the global major axes to measure the global elaboration/exploration
  • the clade major axes to measure the nested elaboration/exploration

Tip-wise approach

For the tip wise approach, the method is pretty straightforward from what I’ve done previously: we just measure the projection/rejection (elaboration/exploration) on each for each axis that we’ve drawn above. We thus end up with 100 projection/rejection score for each element (tip) in the tree for:

  • Each element against the phylo axes
  • Each element against their clade axes
## Global per tip analyses
tip_results_global <- analyses.tip(data = morphdat[, c(1:3)], axes = uncentered_major_axes$animal)

## Plot results
plot.analyses.tip(tip_results_global, cols = colour_vector, group = split(rownames(morphdat), f = morphdat$clade))

But maybe it’s easier to boxplot that to get the info by clade

boxplot.proj.rej <- function(point_list, data, cols) {
    ## Get the classifier
    classifier <- data$clade    
    sort.clade.rej <- function(data, classifier) {
        return(data.frame("points" = data$rejection,
                          "classifier" = classifier))
    }
    sort.clade.proj <- function(data, classifier) {
        return(data.frame("points" = data$projection,
                          "classifier" = classifier))
    }
    ## Sort the data per clade
    rejections <- do.call(rbind, lapply(point_list, sort.clade.rej, classifier))
    projections <- do.call(rbind, lapply(point_list, sort.clade.proj, classifier))
    par(mfrow = c(2,1))
    boxplot(points ~ classifier, data = projections, col = cols, main = "Phylo projections\n(projected on the whole phylo effect)")
    boxplot(points ~ classifier, data = rejections, rejections, col = cols, main = "Phylo rejections")
}

boxplot.proj.rej(phylo_results_uncentered, cols = colour_vector, data = morphdat)

And now we can see the differences for the within clades (i.e. the elaboration/exploration within each clade). The idea here is to distinguish between extra special taxa. For example, in mammals the platypus and the naked mole rat are special, however, the platypus is actually not special among platypuses whereas the naked mole rat is special among rodents!

## The within clades explo/elabo
gulls_results_uncentered <- lapply(uncentered_major_axes$cladegulls, lapply.projections, trait_space = morphdat[which(morphdat$clade == "gulls"), c(1,2,3)])
plover_results_uncentered <- lapply(uncentered_major_axes$cladeplovers, lapply.projections, trait_space = morphdat[which(morphdat$clade == "plovers"), c(1,2,3)])
sandpipers_results_uncentered <- lapply(uncentered_major_axes$cladesandpipers, lapply.projections, trait_space = morphdat[which(morphdat$clade == "sandpipers"), c(1,2,3)])



projection_list <- list("gulls"   = unlist(lapply(gulls_results_uncentered, `[[`, "projection")),
             "plovers" = unlist(lapply(plover_results_uncentered, `[[`, "projection")),
             "sandpipers" = unlist(lapply(sandpipers_results_uncentered, `[[`, "projection")))


rejection_list <- list("gulls"   = unlist(lapply(gulls_results_uncentered, `[[`, "rejection")),
             "plovers" = unlist(lapply(plover_results_uncentered, `[[`, "rejection")),
             "sandpipers" = unlist(lapply(sandpipers_results_uncentered, `[[`, "rejection")))
par(mfrow = c(2,1))
boxplot(projection_list, main = "projection within clades\n(projected on the clade specific effect)", col  = colour_vector)
boxplot(rejection_list, main = "rejection within clades", col  = colour_vector)

They seem a bit off as well. Maybe it’ll be worth centering their axis on their trait space centroid.

TODO.

Ideas for the MCMC

  1. Check on a big global model: a) how long it takes to get 1000 gen. (or so) and b) how long the burnin phase takes.
  2. Use info 1a and 1b to make design “mini chains” (i.e. chains that just go past the burnin and run for an extra ~1000 gen.); the conservative assumption is that post-burnin, the chain has reach a local optimum that exists within the distribution of the global (true) optimum.
  3. Run 3 mini chains per tree on 100 trees (resulting in 300 mini chains) to get phylogenetic uncertainty into account. For each triplet of models, discard any that’s way off (Natalie thesis style for morpho measurements). This results in N models that have reach N local optimums while taking into account phylo uncertainty.
  4. Discard any of the N models that is way off (outliers). We assume that these outliers were chains stuck on local optimums that were outside to global optimum distribution. We then end up with M (M<=N) models that are the most conservative representation of the model because they take into account phylo uncertainty AND optimum uncertainty (i.e. if running one big model, you have to make the assumption that the local optimum reached equal the global optimum, here we can skip that assumption by saying that our distribution of local optimums contains the global optimum).
  5. Combine the results of all the MCMC into one BIG posterior distribution; randomly sample n (=Mx5?) samples (or any other number) from that distribution and work from there! (if we don’t discard any models M = 300 mini chains of >=1000 samples each, thus the sample represents ~5% of all the data).